6 research outputs found

    Look at the First Sentence: Position Bias in Question Answering

    Many extractive question answering models are trained to predict start and end positions of answers. The choice of predicting answers as positions is mainly due to its simplicity and effectiveness. In this study, we hypothesize that when the distribution of answer positions is highly skewed in the training set (e.g., answers lie only in the k-th sentence of each passage), QA models predicting answers as positions can learn spurious positional cues and fail to give answers in different positions. We first illustrate this position bias in popular extractive QA models such as BiDAF and BERT and thoroughly examine how position bias propagates through each layer of BERT. To safely deliver position information without position bias, we train models with various de-biasing methods, including entropy regularization and bias ensembling. Among them, we find that using the prior distribution of answer positions as a bias model is very effective at reducing position bias, recovering the performance of BERT from 37.48% to 81.64% when trained on a biased SQuAD dataset. Comment: 13 pages, EMNLP 2020
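
    As a concrete illustration of the bias-ensembling idea described in this abstract, the sketch below combines the QA model's start/end logits with the log of a prior over answer positions during training, a product-of-experts-style bias product, so the main model is discouraged from relying on positional cues alone. This is a minimal sketch rather than the authors' released code; the tensor shapes, the uniform prior, and the function names are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def bias_product_loss(start_logits, end_logits,
                          start_prior, end_prior,
                          start_targets, end_targets):
        """Train on model logits combined with the log-prior of answer positions
        (bias product); at inference, only the model logits are used."""
        debiased_start = start_logits + torch.log(start_prior + 1e-12)
        debiased_end = end_logits + torch.log(end_prior + 1e-12)
        return (F.cross_entropy(debiased_start, start_targets)
                + F.cross_entropy(debiased_end, end_targets))

    # toy usage: batch of 2, passage length 8, with a (here uniform) position prior
    B, L = 2, 8
    start_logits = torch.randn(B, L, requires_grad=True)
    end_logits = torch.randn(B, L, requires_grad=True)
    prior = torch.full((B, L), 1.0 / L)  # in practice: estimated from the training set
    loss = bias_product_loss(start_logits, end_logits, prior, prior,
                             torch.tensor([1, 3]), torch.tensor([2, 5]))
    loss.backward()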

    Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models

    Questions in open-domain question answering are often ambiguous, allowing multiple interpretations. One approach to handling them is to identify all possible interpretations of the ambiguous question (AQ) and to generate a long-form answer addressing them all, as suggested by Stelmakh et al. (2022). While this provides a comprehensive response without asking the user for clarification, considering multiple dimensions of ambiguity and gathering the corresponding knowledge remains a challenge. To cope with this challenge, we propose a novel framework, Tree of Clarifications (ToC): it recursively constructs a tree of disambiguations for the AQ, via few-shot prompting leveraging external knowledge, and uses it to generate a long-form answer. ToC outperforms existing baselines on ASQA in a few-shot setup across the metrics, while surpassing fully-supervised baselines trained on the whole training set in terms of Disambig-F1 and Disambig-ROUGE. Code is available at https://github.com/gankim/tree-of-clarifications. Comment: Accepted to EMNLP 2023
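
    To make the recursive construction concrete, the sketch below builds a tree of disambiguations for an ambiguous question by repeatedly retrieving passages and prompting an LLM for (disambiguated question, answer) pairs, then returns the tree to be summarized into a long-form answer. retrieve_passages and llm_disambiguate are hypothetical stand-ins, not the real retriever or prompts; those, along with pruning and answer generation, live in the repository linked above.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        question: str                        # a (possibly disambiguated) question
        answer: str = ""                     # short answer for this interpretation
        children: list = field(default_factory=list)

    def retrieve_passages(question: str, k: int = 5) -> list[str]:
        return []                            # placeholder: external knowledge retriever

    def llm_disambiguate(question: str, passages: list[str]) -> list[tuple[str, str]]:
        return []                            # placeholder: few-shot prompted LLM returning
                                             # (disambiguated question, answer) pairs

    def build_tree(aq: str, max_depth: int = 2) -> Node:
        """Recursively expand an ambiguous question (AQ) into a tree of
        disambiguations, each grounded in retrieved passages."""
        root = Node(question=aq)
        frontier = [(root, 0)]
        while frontier:
            node, depth = frontier.pop()
            if depth >= max_depth:
                continue
            passages = retrieve_passages(node.question)
            for dq, ans in llm_disambiguate(node.question, passages):
                child = Node(question=dq, answer=ans)
                node.children.append(child)
                frontier.append((child, depth + 1))
        return root  # the tree is then summarized into one long-form answer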

    KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization

    In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. Our model is initially pre-trained on various multimodal datasets in the general domain before being transferred to the chest X-ray domain. Following a prominent VLM, we unify various domain-specific tasks into a simple sequence-to-sequence schema, which enables the model to effectively learn the required knowledge and skills from the limited resources available in the domain. Demonstrating superior performance on the benchmark datasets provided by the BioNLP shared task, our model benefits from training across multiple tasks and domains. With additional techniques, including ensembling and factual calibration, our system achieves first place on the RadSum23 leaderboard for the hidden test set. Comment: Published at the BioNLP workshop @ ACL 2023
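
    The sequence-to-sequence unification mentioned in the abstract can be pictured as mapping every task to plain (source text, target text) pairs that a single encoder-decoder model consumes. The sketch below is only an assumed illustration of that schema; the task names, prompt templates, and field names are not CheXOFA's actual ones.

    def to_seq2seq(task: str, example: dict) -> tuple[str, str]:
        """Cast a domain-specific example into a (source text, target text) pair."""
        if task == "report_summarization":
            source = "summarize findings: " + example["findings"]
            target = example["impression"]
        elif task == "image_captioning":
            source = "describe the image: " + example["image_tokens"]
            target = example["caption"]
        elif task == "vqa":
            source = "answer: " + example["question"] + " context: " + example["image_tokens"]
            target = example["answer"]
        else:
            raise ValueError(f"unknown task: {task}")
        return source, target

    # every task now yields plain text pairs, so one encoder-decoder model
    # can be trained on all of them jointly
    src, tgt = to_seq2seq("report_summarization",
                          {"findings": "No focal consolidation.",
                           "impression": "No acute cardiopulmonary process."})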

    Towards More Realistic Generation of Information-Seeking Conversations

    In this paper, we introduce SimSeek, a novel framework for simulating information-seeking conversations from unlabeled documents, and compare two variants of it to provide a deeper perspective on information-seeking behavior. We first introduce a strong simulator for information-symmetric conversation, SimSeek-sym, in which the questioner and answerer share all knowledge while conversing with one another. Although it simulates reasonable conversations, we take a further step toward more realistic information-seeking conversation and propose SimSeek-asym, which assumes information asymmetry between the two agents, encouraging the questioner to seek new information from a document it cannot access. In our experiments, we demonstrate that SimSeek-asym successfully generates information-seeking conversations for two downstream tasks, conversational question answering (CQA) and conversational search. In particular, SimSeek-asym improves baseline models by 1.1-1.9 F1 points on QuAC and by 1.1 MRR points on OR-QuAC. Moreover, we thoroughly analyze our synthetic datasets to identify crucial factors for realistic information-seeking conversation. Comment: 10 pages, preprint
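
    The information asymmetry in SimSeek-asym can be sketched as a simulation loop in which the questioner sees only the topic and the conversation history while the answerer reads the full document. questioner and answerer below are hypothetical stand-ins for the trained question-generation and answer-extraction components, not the released implementation.

    def simulate_conversation(document: str, topic: str,
                              questioner, answerer, num_turns: int = 5):
        """Simulate an information-asymmetric conversation: the questioner never
        sees the document, only the topic and the dialogue history."""
        history = []                                   # list of (question, answer) turns
        for _ in range(num_turns):
            question = questioner(topic=topic, history=history)
            answer = answerer(document=document, question=question, history=history)
            history.append((question, answer))
        return history

    # toy usage with trivial stand-in models
    dialogue = simulate_conversation(
        document="The Beatles were an English rock band formed in Liverpool in 1960.",
        topic="The Beatles",
        questioner=lambda topic, history: f"What else can you tell me about {topic}?",
        answerer=lambda document, question, history: document,
        num_turns=2,
    )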